Data Description

orders (3.4m rows, 206k users):
* order_id: order identifier
* user_id: customer identifier
* eval_set: which evaluation set this order belongs in (see SET described below)
* order_number: the order sequence number for this user (1 = first, n = nth)
* order_dow: the day of the week the order was placed on
* order_hour_of_day: the hour of the day the order was placed on
* days_since_prior: days since the last order, capped at 30 (with NAs for order_number = 1)

products (50k rows):
* product_id: product identifier
* product_name: name of the product
* aisle_id: foreign key
* department_id: foreign key

aisles (134 rows):
* aisle_id: aisle identifier
* aisle: the name of the aisle

departments (21 rows):
* department_id: department identifier
* department: the name of the department

order_products__SET (30m+ rows):
* order_id: foreign key
* product_id: foreign key
* add_to_cart_order: order in which each product was added to cart
* reordered: 1 if this product has been ordered by this user in the past, 0 otherwise

where SET is one of the three following evaluation sets (eval_set in orders):
* "prior": orders prior to that user's most recent order (~3.2m orders)
* "train": training data supplied to participants (~131k orders)
* "test": test data reserved for machine learning competitions (~75k orders)
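
The foreign keys described above tie the tables together. As a minimal sketch (plain Python, standard library only; the report's own code is not shown, so the helper name `describe_product` and the dictionary layout are illustrative assumptions), a product row can be resolved to its aisle and department names using a few rows copied from the sample tables in this report:

```python
# Lookup tables keyed by identifier; rows copied from the aisles,
# departments and products samples shown in this report.
aisles = {61: "cookies cakes", 104: "spices seasonings", 94: "tea"}
departments = {19: "snacks", 13: "pantry", 7: "beverages"}
products = {
    1: {"product_name": "Chocolate Sandwich Cookies", "aisle_id": 61, "department_id": 19},
    2: {"product_name": "All-Seasons Salt", "aisle_id": 104, "department_id": 13},
    3: {"product_name": "Robust Golden Unsweetened Oolong Tea", "aisle_id": 94, "department_id": 7},
}

def describe_product(product_id):
    """Hypothetical helper: follow aisle_id and department_id to their names."""
    row = products[product_id]
    return {
        "product_name": row["product_name"],
        "aisle": aisles[row["aisle_id"]],
        "department": departments[row["department_id"]],
    }

print(describe_product(1))
```

The same two lookups are what a join of products with aisles and departments would perform for every row.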

Table 1 - aisles

#> [1] 0
The aisles table
aisle_id aisle
1 prepared soups salads
2 specialty cheeses
3 energy granola bars
4 instant foods
5 marinades meat preparation
6 other
7 packaged meat
8 bakery desserts
9 pasta sauce
10 kitchen supplies
11 cold flu allergy
12 fresh pasta
13 prepared meals
14 tofu meat alternatives
15 packaged seafood
16 fresh herbs
17 baking ingredients
18 bulk dried fruits vegetables
19 oils vinegars
20 oral hygiene
21 packaged cheese
22 hair care
23 popcorn jerky
24 fresh fruits
25 soap
26 coffee
27 beers coolers
28 red wines
29 honeys syrups nectars
30 latino foods
31 refrigerated
32 packaged produce
33 kosher foods
34 frozen meat seafood
35 poultry counter
36 butter
37 ice cream ice
38 frozen meals
39 seafood counter
40 dog food care
41 cat food care
42 frozen vegan vegetarian
43 buns rolls
44 eye ear care
45 candy chocolate
46 mint gum
47 vitamins supplements
48 breakfast bars pastries
49 packaged poultry
50 fruit vegetable snacks
51 preserved dips spreads
52 frozen breakfast
53 cream
54 paper goods
55 shave needs
56 diapers wipes
57 granola
58 frozen breads doughs
59 canned meals beans
60 trash bags liners
61 cookies cakes
62 white wines
63 grains rice dried goods
64 energy sports drinks
65 protein meal replacements
66 asian foods
67 fresh dips tapenades
68 bulk grains rice dried goods
69 soup broth bouillon
70 digestion
71 refrigerated pudding desserts
72 condiments
73 facial care
74 dish detergents
75 laundry
76 indian foods
77 soft drinks
78 crackers
79 frozen pizza
80 deodorants
81 canned jarred vegetables
82 baby accessories
83 fresh vegetables
84 milk
85 food storage
86 eggs
87 more household
88 spreads
89 salad dressing toppings
90 cocoa drink mixes
91 soy lactosefree
92 baby food formula
93 breakfast bakery
94 tea
95 canned meat seafood
96 lunch meat
97 baking supplies decor
98 juice nectars
99 canned fruit applesauce
100 missing
101 air fresheners candles
102 baby bath body care
103 ice cream toppings
104 spices seasonings
105 doughs gelatins bake mixes
106 hot dogs bacon sausage
107 chips pretzels
108 other creams cheeses
109 skin care
110 pickled goods olives
111 plates bowls cups flatware
112 bread
113 frozen juice
114 cleaning products
115 water seltzer sparkling water
116 frozen produce
117 nuts seeds dried fruit
118 first aid
119 frozen dessert
120 yogurt
121 cereal
122 meat counter
123 packaged vegetables fruits
124 spirits
125 trail mix snack mix
126 feminine care
127 body lotions soap
128 tortillas flat bread
129 frozen appetizers sides
130 hot cereal pancake mixes
131 dry pasta
132 beauty
133 muscles joints pain relief
134 specialty wines champagnes

Table 2 - departments

#> [1] 0
The departments table
department_id department
1 frozen
2 other
3 bakery
4 produce
5 alcohol
6 international
7 beverages
8 pets
9 dry goods pasta
10 bulk
11 personal care
12 meat seafood
13 pantry
14 breakfast
15 canned goods
16 dairy eggs
17 household
18 babies
19 snacks
20 deli
21 missing

Table 3 - products

#> [1] 0
The products table

| product_id | product_name | aisle_id | department_id |
|---|---|---|---|
| 1 | Chocolate Sandwich Cookies | 61 | 19 |
| 2 | All-Seasons Salt | 104 | 13 |
| 3 | Robust Golden Unsweetened Oolong Tea | 94 | 7 |
| 4 | Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce | 38 | 1 |
| 5 | Green Chile Anytime Sauce | 5 | 13 |
| 6 | Dry Nose Oil | 11 | 11 |
| 7 | Pure Coconut Water With Orange | 98 | 7 |
| 8 | Cut Russet Potatoes Steam N’ Mash | 116 | 1 |
| 9 | Light Strawberry Blueberry Yogurt | 120 | 16 |
| 10 | Sparkling Orange Juice & Prickly Pear Beverage | 115 | 7 |
| 11 | Peach Mango Juice | 31 | 7 |
| 12 | Chocolate Fudge Layer Cake | 119 | 1 |
| 13 | Saline Nasal Mist | 11 | 11 |
| 14 | Fresh Scent Dishwasher Cleaner | 74 | 17 |
| 15 | Overnight Diapers Size 6 | 56 | 18 |
| 16 | Mint Chocolate Flavored Syrup | 103 | 19 |
| 17 | Rendered Duck Fat | 35 | 12 |
| 18 | Pizza for One Suprema Frozen Pizza | 79 | 1 |
| 19 | Gluten Free Quinoa Three Cheese & Mushroom Blend | 63 | 9 |
| 20 | Pomegranate Cranberry & Aloe Vera Enrich Drink | 98 | 7 |
| 21 | Small & Medium Dental Dog Treats | 40 | 8 |
| 22 | Fresh Breath Oral Rinse Mild Mint | 20 | 11 |
| 23 | Organic Turkey Burgers | 49 | 12 |
| 24 | Tri-Vi-Sol® Vitamins A-C-and D Supplement Drops for Infants | 47 | 11 |
| 25 | Salted Caramel Lean Protein & Fiber Bar | 3 | 19 |
| 26 | Fancy Feast Trout Feast Flaked Wet Cat Food | 41 | 8 |
| 27 | Complete Spring Water Foaming Antibacterial Hand Wash | 127 | 11 |
| 28 | Wheat Chex Cereal | 121 | 14 |
| 29 | Fresh Cut Golden Sweet No Salt Added Whole Kernel Corn | 81 | 15 |
| 30 | Three Cheese Ziti, Marinara with Meatballs | 38 | 1 |
| 31 | White Pearl Onions | 123 | 4 |
| 32 | Nacho Cheese White Bean Chips | 107 | 19 |
| 33 | Organic Spaghetti Style Pasta | 131 | 9 |
| 34 | Peanut Butter Cereal | 121 | 14 |
| 35 | Italian Herb Porcini Mushrooms Chicken Sausage | 106 | 12 |
| 36 | Traditional Lasagna with Meat Sauce Savory Italian Recipes | 38 | 1 |
| 37 | Noodle Soup Mix With Chicken Broth | 69 | 15 |
| 38 | Ultra Antibacterial Dish Liquid | 100 | 21 |
| 39 | Daily Tangerine Citrus Flavored Beverage | 64 | 7 |
| 40 | Beef Hot Links Beef Smoked Sausage With Chile Peppers | 106 | 12 |
| 41 | Organic Sourdough Einkorn Crackers Rosemary | 78 | 19 |
| 42 | Biotin 1000 mcg | 47 | 11 |
| 43 | Organic Clementines | 123 | 4 |
| 44 | Sparkling Raspberry Seltzer | 115 | 7 |
| 45 | European Cucumber | 83 | 4 |
| 46 | Raisin Cinnamon Bagels 5 count | 58 | 1 |
| 47 | Onion Flavor Organic Roasted Seaweed Snack | 66 | 6 |
| 48 | School Glue, Washable, No Run | 87 | 17 |
| 49 | Vegetarian Grain Meat Sausages Italian - 4 CT | 14 | 20 |
| 50 | Pumpkin Muffin Mix | 105 | 13 |

Table 4 - order_products__train

#> [1] 0
The order_products__train table
order_id product_id add_to_cart_order reordered
1 49302 1 1
1 11109 2 1
1 10246 3 0
1 49683 4 0
1 43633 5 1
1 13176 6 0
1 47209 7 0
1 22035 8 1
36 39612 1 0
36 19660 2 1
36 49235 3 0
36 43086 4 1
36 46620 5 1
36 34497 6 1
36 48679 7 1
36 46979 8 1
38 11913 1 0
38 18159 2 0
38 4461 3 0
38 21616 4 1
38 23622 5 0
38 32433 6 0
38 28842 7 0
38 42625 8 0
38 39693 9 0
96 20574 1 1
96 30391 2 0
96 40706 3 1
96 25610 4 0
96 27966 5 1
96 24489 6 1
96 39275 7 1
98 8859 1 1
98 19731 2 1
98 43654 3 1
98 13176 4 1
98 4357 5 1
98 37664 6 1
98 34065 7 1
98 35951 8 1
98 43560 9 1
98 9896 10 1
98 27509 11 1
98 15455 12 1
98 27966 13 1
98 47601 14 1
98 40396 15 1
98 35042 16 1
98 40986 17 1
98 1939 18 1
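
To make the `reordered` flag concrete, the following sketch (plain Python; the function name `reorder_share_by_order` is an illustrative assumption, not something from the report's code) computes the fraction of reordered items in each basket, using the rows for orders 1 and 36 copied from the table above:

```python
from collections import defaultdict

# (order_id, product_id, add_to_cart_order, reordered) rows for orders 1 and 36,
# copied from the order_products__train sample.
rows = [
    (1, 49302, 1, 1), (1, 11109, 2, 1), (1, 10246, 3, 0), (1, 49683, 4, 0),
    (1, 43633, 5, 1), (1, 13176, 6, 0), (1, 47209, 7, 0), (1, 22035, 8, 1),
    (36, 39612, 1, 0), (36, 19660, 2, 1), (36, 49235, 3, 0), (36, 43086, 4, 1),
    (36, 46620, 5, 1), (36, 34497, 6, 1), (36, 48679, 7, 1), (36, 46979, 8, 1),
]

def reorder_share_by_order(rows):
    """Return {order_id: fraction of items in the order flagged as reordered}."""
    items = defaultdict(int)
    reorders = defaultdict(int)
    for order_id, _product_id, _position, reordered in rows:
        items[order_id] += 1
        reorders[order_id] += reordered
    return {order_id: reorders[order_id] / items[order_id] for order_id in items}

shares = reorder_share_by_order(rows)
print(shares)  # {1: 0.5, 36: 0.75}
```

Order 1 contains 4 reordered items out of 8, and order 36 contains 6 out of 8.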

Table 5 - order_products__prior

#> [1] 0
The order_products__prior table
order_id product_id add_to_cart_order reordered
2 33120 1 1
2 28985 2 1
2 9327 3 0
2 45918 4 1
2 30035 5 0
2 17794 6 1
2 40141 7 1
2 1819 8 1
2 43668 9 0
3 33754 1 1
3 24838 2 1
3 17704 3 1
3 21903 4 1
3 17668 5 1
3 46667 6 1
3 17461 7 1
3 32665 8 1
4 46842 1 0
4 26434 2 1
4 39758 3 1
4 27761 4 1
4 10054 5 1
4 21351 6 1
4 22598 7 1
4 34862 8 1
4 40285 9 1
4 17616 10 1
4 25146 11 1
4 32645 12 1
4 41276 13 1
5 13176 1 1
5 15005 2 1
5 47329 3 1
5 27966 4 1
5 23909 5 1
5 48370 6 1
5 13245 7 1
5 9633 8 1
5 27360 9 1
5 6348 10 1
5 40878 11 1
5 6184 12 1
5 48002 13 1
5 20914 14 1
5 37011 15 1
5 12962 16 1
5 45698 17 1
5 24773 18 1
5 18569 19 1
5 41176 20 1

Table 6 - orders

#> [1] 206209
#> [1] 206209

The first chart, days_since_prior_order, shows that users are most likely to place their next order about a week after the previous one. The order_dow chart shows that Sunday and Monday are the most frequent ordering days compared with the rest of the week, and the last chart, order_hour_of_day, shows high order demand between 9am and 6pm.
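
The day-of-week and hour-of-day readings above come down to simple frequency counts. The sketch below (plain Python) illustrates the idea on a handful of made-up orders. Everything in it is an assumption for illustration: the toy rows are not taken from the dataset, the helper names are hypothetical, and the 0 = Sunday coding of order_dow simply follows the reading used in this report.

```python
from collections import Counter

# Assumption: order_dow is coded 0-6 with 0 = Sunday, as read in this report.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

# Hypothetical (order_dow, order_hour_of_day) pairs, NOT taken from the dataset.
toy_orders = [(0, 10), (0, 14), (1, 9), (1, 15), (1, 11), (3, 20), (5, 13), (6, 8)]

def busiest_days(orders, top=2):
    """Return the `top` most frequent days as (day name, count) pairs."""
    counts = Counter(dow for dow, _hour in orders)
    return [(DAY_NAMES[dow], n) for dow, n in counts.most_common(top)]

def share_in_window(orders, start_hour=9, end_hour=18):
    """Fraction of orders placed with start_hour <= hour < end_hour."""
    inside = sum(1 for _dow, hour in orders if start_hour <= hour < end_hour)
    return inside / len(orders)

print(busiest_days(toy_orders))
print(share_in_window(toy_orders))
```

Running the same counts over the full orders table would give the bar heights of the order_dow and order_hour_of_day charts.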

Table 7 - user_purchases

#> [1] Inf

Mandy comment: still keeping the top 10 reordered products and the top 10 reordered products by aisle, just to see the overall picture.

Top 10 reordered products

The table shows that the most reordered products come mostly from the fresh fruits and packaged vegetables fruits aisles of the produce department.
The top 10 reordered products

| product_id | product_name | aisle | department | total_reorder | total_order | percentage_reorder |
|---|---|---|---|---|---|---|
| 24852 | Banana | fresh fruits | produce | 415166 | 491291 | 84.5 |
| 13176 | Bag of Organic Bananas | fresh fruits | produce | 329275 | 394930 | 83.4 |
| 21137 | Organic Strawberries | fresh fruits | produce | 214448 | 275577 | 77.8 |
| 21903 | Organic Baby Spinach | packaged vegetables fruits | produce | 194939 | 251705 | 77.4 |
| 47209 | Organic Hass Avocado | fresh fruits | produce | 176173 | 220877 | 79.8 |
| 47766 | Organic Avocado | fresh fruits | produce | 140270 | 184224 | 76.1 |
| 27845 | Organic Whole Milk | milk | dairy eggs | 118684 | 142813 | 83.1 |
| 47626 | Large Lemon | fresh fruits | produce | 112178 | 160792 | 69.8 |
| 27966 | Organic Raspberries | packaged vegetables fruits | produce | 109688 | 142603 | 76.9 |
| 16797 | Strawberries | fresh fruits | produce | 104588 | 149445 | 70.0 |

Overall, approximately 60% of all products ordered are reorders.
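
The figures in the table are consistent with percentage_reorder being 100 * total_reorder / total_order, rounded to one decimal place. The sketch below (plain Python; the function name `reorder_summary` is an illustrative assumption, and this is not the report's own code) shows that aggregation on the few line items for products 13176 and 27966 that appear in the Table 4 and Table 5 samples, then cross-checks the formula against the Banana row:

```python
from collections import defaultdict

# (product_id, reordered) pairs copied from the Table 4 and Table 5 samples.
line_items = [
    (13176, 0), (13176, 1), (13176, 1),  # orders 1, 98 and 5
    (27966, 1), (27966, 1), (27966, 1),  # orders 96, 98 and 5
]

def reorder_summary(line_items):
    """Return {product_id: (total_reorder, total_order, percentage_reorder)}."""
    total_order = defaultdict(int)
    total_reorder = defaultdict(int)
    for product_id, reordered in line_items:
        total_order[product_id] += 1
        total_reorder[product_id] += reordered
    return {
        pid: (total_reorder[pid], total_order[pid],
              round(100 * total_reorder[pid] / total_order[pid], 1))
        for pid in total_order
    }

print(reorder_summary(line_items))
# Cross-check with the Banana row: 415166 reorders out of 491291 orders.
print(round(100 * 415166 / 491291, 1))  # 84.5
```

On this six-row sample the percentages (66.7 and 100.0) naturally differ from the full-data values in the table.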

Top 10 reordered products by aisle

The top 10 reordered products by aisle

| aisle | department | total_reorder | total_order | percentage_reorder |
|---|---|---|---|---|
| fresh fruits | produce | 2726251 | 3792661 | 71.9 |
| fresh vegetables | produce | 2123540 | 3568630 | 59.5 |
| packaged vegetables fruits | produce | 1178700 | 1843806 | 63.9 |
| yogurt | dairy eggs | 1034957 | 1507583 | 68.7 |
| milk | dairy eggs | 722128 | 923659 | 78.2 |
| water seltzer sparkling water | beverages | 640988 | 878150 | 73.0 |
| packaged cheese | dairy eggs | 598280 | 1021462 | 58.6 |
| soy lactosefree | dairy eggs | 460069 | 664493 | 69.2 |
| chips pretzels | snacks | 444036 | 753739 | 58.9 |
| bread | bakery | 408010 | 608469 | 67.1 |

Top 10 reordered products by department

The top 10 reordered products by department

| department | total_reorder | total_order | percentage_reorder |
|---|---|---|---|
| produce | 6432596 | 9888378 | 65.1 |
| dairy eggs | 3773723 | 5631067 | 67.0 |
| beverages | 1832952 | 2804175 | 65.4 |
| snacks | 1727075 | 3006412 | 57.4 |
| frozen | 1268058 | 2336858 | 54.3 |
| bakery | 769880 | 1225181 | 62.8 |
| pantry | 679799 | 1956819 | 34.7 |
| deli | 666231 | 1095540 | 60.8 |
| canned goods | 511317 | 1114857 | 45.9 |
| meat seafood | 420349 | 739238 | 56.9 |

Sales Patterns

Here we look at the sales patterns in more depth by splitting them by department. First, the weekly pattern of sales.


From these graphs, we can observe the following patterns:

  1. Although the graph shown at the beginning indicates that purchases usually peak on Sunday and Monday, alcohol is the exception. Its orders increase slightly from Sunday onward, peak on Friday, then drop sharply on Saturday.

  2. Beverages and snacks follow the same pattern: most orders are placed on Sunday and Monday, with the peak on Monday, and the number of orders then declines in the days before Friday. Friday is the third peak of the week, and Saturday has the fewest orders.

  3. The remaining departments share a similar pattern: orders fall from their peak on Sunday and Monday, then rebound slightly on Saturday.
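
Comparing departments of very different sizes is easier when each day is expressed as a share of the department's weekly total. The sketch below (plain Python) shows one way to do that and to locate each department's peak day. The counts are hypothetical numbers chosen only to mimic the shapes described above, not real figures from the dataset; the helper names are illustrative, and day index 0 is assumed to be Sunday.

```python
# Hypothetical order counts per department for order_dow 0-6 (0 assumed to be Sunday).
# These numbers are made up for illustration and are NOT from the dataset.
toy_counts = {
    "alcohol": [10, 12, 13, 14, 16, 20, 8],
    "snacks": [40, 45, 30, 28, 27, 32, 25],
}

def daily_shares(counts):
    """Convert per-day counts into each day's share of the weekly total."""
    total = sum(counts)
    return [n / total for n in counts]

def peak_day(counts):
    """Index (0-6) of the day with the most orders."""
    return max(range(len(counts)), key=lambda dow: counts[dow])

for department, counts in toy_counts.items():
    shares = daily_shares(counts)
    peak = peak_day(counts)
    print(department, peak, round(shares[peak], 3))
```

With these toy counts, alcohol peaks on index 5 (Friday under the assumed coding) and snacks on index 1 (Monday).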


In terms of the time of day, the graphs above show that for every department most orders are placed between 9am and 4pm, with slight fluctuations within that period. The department that deserves more attention is babies: although it follows a similar pattern, it fluctuates more than the others.


Mandy: add PCA to the EDA (slide 23) to show which variables are important to the outcome.

Summary from TA:
* Both supervised and unsupervised methods can forecast reordered products
* Supervised = Tree, SVM
* Unsupervised = clustering, PCA

Our output:

  1. Forecast the number of days between the prior orders and the recorded orders -> clustering, PCA

  2. Reordered products

  3. Association between products: PCA and clustering in class